Grammars Have Exceptions

Authors

  • Valter Crescenzi
  • Giansalvatore Mecca
Abstract

Extending database-like techniques to semi-structured and Web data sources is becoming a prominent research field. These data sources are essentially collections of textual documents. Hence, in this context, one of the key tasks consists of wrapping documents to build database abstractions of their content that can be manipulated using high-level tools. However, the degree of heterogeneity and the lack of structure make standard grammar parsers excessively rigid, and often unable to capture the richness of constructs in these documents. This paper presents Minerva, a formalism for writing wrappers around Web sites and other textual data sources. The key feature of Minerva is the attempt to couple the benefits of a declarative, grammar-based approach with the flexibility of procedural programming. This is done by enriching regular grammars with an explicit exception-handling mechanism. The contributions of the paper are the definition of the formalism and the description of its implementation, which relies on a number of ad-hoc techniques for parsing documents, among them an extension of the traditional LL(1) policy based on dynamic tokenization.
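The idea of pairing a declarative production with a procedural exception handler can be illustrated with a minimal Python sketch. This is not Minerva syntax and not the authors' implementation; the names `BOOK`, `ParseFailure`, `parse_book`, `skip_to_next_item`, `parse_books`, and the sample `PAGE` are all hypothetical, chosen only to show a parser that raises an exception when a production fails and a handler that decides where parsing resumes.

```python
import re

# Hypothetical production: Book -> "<li>" TITLE "," YEAR "</li>"
BOOK = re.compile(r"<li>(?P<title>[^<,]+),\s*(?P<year>\d{4})</li>")


class ParseFailure(Exception):
    """Raised when a production does not match at the given offset."""

    def __init__(self, production, position):
        super().__init__(f"{production} failed at offset {position}")
        self.production = production
        self.position = position


def parse_book(text, pos):
    """Declarative part: match one Book production starting at pos."""
    match = BOOK.match(text, pos)
    if match is None:
        raise ParseFailure("Book", pos)
    record = {"title": match.group("title").strip(),
              "year": int(match.group("year"))}
    return record, match.end()


def skip_to_next_item(text, pos):
    """Procedural part: resume at the next "<li>" after pos (-1 if none)."""
    return text.find("<li>", pos + 1)


def parse_books(text, on_failure=skip_to_next_item):
    """Parse every item; irregular items go to the handler instead of aborting."""
    records, failures = [], []
    pos = text.find("<li>")
    while pos != -1:
        try:
            record, end = parse_book(text, pos)
            records.append(record)
            pos = text.find("<li>", end)
        except ParseFailure as failure:
            failures.append(failure.position)
            pos = on_failure(text, failure.position)
    return records, failures


# Sample input with one irregular item that the production cannot match.
PAGE = ("<ul><li>Dune, 1965</li><li>Out of print</li>"
        "<li>Neuromancer, 1984</li></ul>")
records, failures = parse_books(PAGE)
```

In this sketch the regular structure stays in the production, while the handler encodes the recovery policy. A different policy (for example, extracting the irregular item as free text) could be swapped in through the `on_failure` parameter without touching the production.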


Similar resources

Complexity of Context-Free Grammars with Exceptions

This report has been submitted for publication outside of ITC and will probably be copyrighted if accepted for publication. It has been issued as a Technical Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of ITC prior to publication should be limited to peer communications and specific requests. Afte...


Centro per La Ricerca Scientifica E Tecnologica

The Standard Generalized Markup Language (SGML) and the Extensible Markup Language (XML) allow authors to better transmit the semantics in their documents by explicitly specifying the relevant structures in a document or class of documents by means of document type definitions (DTDs). Several authors proposed to regard DTDs as extended context-free grammars expressed in a notation similar to extended ...


SGML and Exceptions

The Standard Generalized Markup Language (SGML) allows users to define document type definitions (DTDs), which are essentially extended context-free grammars in a notation that is similar to extended Backus-Naur form. The right-hand side of a production is called a content model and its semantics can be modified by exceptions. We give precise definitions of the semantics of exceptions and prove tha...


SGML and XML Document Grammars and Exceptions

The Standard Generalized Markup Language (SGML) and the Extensible Markup Language (XML) allow users to define document type definitions (DTDs), which are essentially extended context-free grammars expressed in a notation that is similar to extended Backus-Naur form. The right-hand side of a production, called a content model, is both an extended and a restricted regular expression. The semantics...



Journal:
  • Inf. Syst.

Volume 23

Publication date: 1998